What does it have to do with physics?
(very brief:)
-Physics is about the gathering of:
a) Performing experiments to see if we can get a ruke of thumb to solve the problem: “When I do this experiment(s), what will the result(s) be?”
b) Creating a rule of thumb- so far an art.
c) Doing more experiments to check if that rule of thumb is correct.
d) Studying this rule to obtain some philosophical meaning.
-Peter Norvig: d) is meaninngless
-Galileo: if I roll a ball down a slope, (actually a slope with no friction) when it is sloped at 60 degrees, how low will it fall after time t?
-He used his pulse to time his experiments, and he got tables like this, below showing the time vs the distance it travelled: Galileo%20table.JPG (This picture is actually from the Feynman lectures) -After doing it many, many times, and using his pulse as the clock, he got a rule of thumb.
-They checked it with many experiments.
Today it is as:
S1. Get time t
S2. Calculate distance D=4.2435×t².
S3. D is the distance in SI units (metres)

#S1:
t= 0.4 #Change this and click shift+enter to find D
#S2:
D= 4.2435*t**2
#S3:
print( "Distance=", D )

Distance= 0.6789600000000001

-The number "4.2435" was actually specific to when the slope is "sloped" at 60 degrees to the ground, and that it has no friction. If the slope was slanted at any angle θ to the ground, then the rule of thumb would be only slightly different:
S1. Get time t and θ
S2. Calculate distance D=4.9×sin θ×t².
S3. D is the distance in metres

import math

#S1:
t= 0.4 #Change this and click shift+enter to find D for different values of t
theta= math.pi/3 #If you want it for 60 degrees
#S2:
D= 4.9*math.sin(theta)*t**2
#S3:
print( "Distance=", D )

Distance= 0.6789639165670001

(If you set theta to π/3 here you'll get the same answer as above.)
-Further, the number 4.9 is equal to (1/2)g, where g is the "acceleration due to gravity" on Earth.
-For θ=90 degrees, it's just a ball falling

import matplotlib.pyplot as plt
import math

T=[]
D_T=[]

for i in range(20):
    T.append(i)
    D_T.append(4.9*i**2)

plt.plot( T, D_T )

[<matplotlib.lines.Line2D at 0xbb5d370>]

-Looks like the graph above
-It's a parabola

Higher orders

-Rate of change in position: velocity
-Rate of change in velocity: acceleration
-Rate of change in acceleration: jerk
-Rate of change in jerk: jounce
....etc.
-In general, D(t)= D₀ + Veloctiy₀t + Acceleration₀(t²/2) + Jerk₀(t³/6) + Jounce₀(t⁴/24) + ....

Was what Galileo did art or science? Is there a rule of thumb to look at the table ("Galileo's" table, but not really Galileo's), and get the rule of thumb we wrote earlier?
(Week-1,2, briefly:)
Galileo%20table.JPG -Galileo is a genius, and he DID NOT use any rule of thumb to get the answer.
-But this question is central to what is known as machine learning- for some problem, or a certain class of problems, is there a rule of thumb to get us the rule of thumb to solve those problems?
-There are several rules of thumb to get these today, but even as far back as more than 200 years ago, Lagrange invented a rule of thumb to create a rule of thumb which solves the problem: given a table like this,:
what rule of thumb will solve the problem: What is D(t), given t? It's called Lagrange interpolation.

Lagrange interpolation
L1. Read each value of time T and the distance at that time, D(T), and compute D(t)/(T−t₀)(T−t₂)(T−t₂)....(T−t_last) where the t_i's are the other values (not including the one you're looking at now).
L2. Write this down on the right of the expression: +(t−t₀)(t−t₂)(t−t₂)....(t−t_last)
(again, skipping the current one- T). L3. Repeat steps (1) and (2) for all the values of time T

The expression you have written down fully is a formula (and, of course, also a rule of thumb) to get D(t) if you are given t.

#Lagrange interpolation
t_vs_D= { 0:0, 1:16, 2:64, 3:44, 4:256, 5:400, 6:576 }#This is from Galileo's table; change it and try it out for more

for T, DT in t_vs_D.items():
    #Compute (T−t0)(T−t2)(T−t2)....(T−tlast) 
    c= DT
    for ti in t_vs_D:
        if ti is not T:
            c= c/(T-ti)
    #Write down expression
    print( "  +   ", end=" " )
    for ti in t_vs_D:
        if ti is not T:
            print( "(t-",ti,")", end=" " )
    print( ".",c )

  +    (t- 1 ) (t- 2 ) (t- 3 ) (t- 4 ) (t- 5 ) (t- 6 ) . 0.0
  +    (t- 0 ) (t- 2 ) (t- 3 ) (t- 4 ) (t- 5 ) (t- 6 ) . -0.13333333333333333
  +    (t- 0 ) (t- 1 ) (t- 3 ) (t- 4 ) (t- 5 ) (t- 6 ) . 1.3333333333333333
  +    (t- 0 ) (t- 1 ) (t- 2 ) (t- 4 ) (t- 5 ) (t- 6 ) . -1.222222222222222
  +    (t- 0 ) (t- 1 ) (t- 2 ) (t- 3 ) (t- 5 ) (t- 6 ) . 5.333333333333333
  +    (t- 0 ) (t- 1 ) (t- 2 ) (t- 3 ) (t- 4 ) (t- 6 ) . -3.3333333333333335
  +    (t- 0 ) (t- 1 ) (t- 2 ) (t- 3 ) (t- 4 ) (t- 5 ) . 0.7999999999999999

And there you have it! If you substitute t= 1, or 2 , or 3 or anything, in this polynomial, you will get the same value as in the table. (Only in this case, there will be a few decimal places off, but if we interpret the indefinite repititioin as something/9, we will get the exact answer.)
-You can see that it can be simplified massively.
-It's NOT the same as Galileo's- the (1/2)g×sin θ×t² one. Clearly, this rule-of-thumb is not as smart as Galileo
-It's been 220 years since Lagrange invented this. It is NOT a good rule-of-thumb for the kind of problems today, most of the time.
-That's why it has NOT been used thoroughly in machine learning.
-But it gives a load of insight:
1) Difficulty: -In the last century, mathematicians (and computer scienists) Alan Cobham and Jack Edmonds proposed a theory that a problem must have certain characteristics, related to what is called its Time complexity in order for a computer to be able to solve it easily. That is, the rule-of-thumb(s) to solve the problem(s) have to be easy (fast) and not hard (slow) for computers to follow. They ignited a debate related to the million dollar problem, the P vs NP problem (no, literally, its solution is worth a million dollars: https://en.wikipedia.org/wiki/Millennium_Prize_Problems .
-Actually this problem was earlier discussed in the letters of John Forbes Nash Jr. (you've seen A Beautiful Mind, right?)
-The Laplace interpolation is one way to obtain a polynomial, but there is one other obvious way using linear algebra. We will have to solve a system of N linear equations in N variables to obtain N coefficients.
-But, if we did this using the method we learnt in school, of finding the adjoint and the determinant separately, we would take really, really long for large N.
-This is because the method of finding the determinant that we learnt in school- which was actually invented by Laplace too!- called Laplace expansion, we would be using what Alan Cobham and Jack Edmonds would call hard.
-But thanks to mathematicians, almost all the basic operations of linear algebra we can use today are "easy".
2) Interpolation
-The method is called Lagrange interpolation because, once you have the formula, like the one above, you can substitute values for t that are NOT in the table and guess the value of D(t). (You WILL get a wrong answer if you actually do the experiment and verify this again; Galileo was really, really smart.)
-It led to the study of many more such techniques and now we have many
-A similar technique, also using polynomials, is called spline interpolation
-Spline interpolation is largely responsible for the revolution in computer graphics in the last many decades- due to the use of a concept- Bezier curves in animation
3) Adi Shamir
-Adi Shamir used the basic mathematics behind Lagrange interpolation to create a method to encrypt messages in a scheme now known as Shamir's Secret Sharing
-Yu can find a Python source implementeing that scheme at:https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing -Incidentally, Adi Shamir is more famous for using the fact that NO ONE HAS YET SOLVED THE P VS NP PROBLEM to co-create a method of encrypting messages- known as the RSA scheme
-It is one of the most successful encryption schemes in history
-It requires a powerful quantum computer to read the original messages, given the encrypted versions of a message/messages using RSA